{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Gemma-2-2b-it Demo\n",
    "\n",
    "<a target=\"_blank\" href=\"https://colab.research.google.com/github/safety-research/circuit-tracer/blob/main/demos/gemma_it_demo.ipynb\">\n",
    "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
    "</a>\n",
    "\n",
    "We have no transcoders for gemma-2-2b-it, the instruction-tuned version of gemma-2-2b. However, [past work](https://www.alignmentforum.org/posts/fmwk6qxrpW8d4jvbd/saes-usually-transfer-between-base-and-chat-models) suggests that sparse autoencoders transfer reasonably well from base to fine-tuned models. Is the same true for transcoders in the context of circuit-finding? We will apply the gemma-2-2b transcoders to gemma-2-2b, on just a couple of examples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- [`User:\\nWho are you? Please answer in pirate-speak.\\nModel\\nA` → `hoy`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-it-pirate&clerps=%5B%5D&pinnedIds=27_65226_21%2C22_9019_21%2C25_10358_21%2C25_1746_21%2C24_2656_21%2C18_15126_21%2C23_8519_21%2C21_136_21%2C23_9011_21%2C20_15037_21%2C19_11298_21%2C17_3874_21%2C20_15899_21%2C15_9318_12%2C14_10408_12%2C19_1504_21%2C17_13969_21%2C19_5584_21%2C14_10179_12%2C4_12527_12%2C6_3848_12%2C11_4036_12%2C6_1601_12%2C4_8481_12%2CE_3_12%2C21_10451_21&supernodes=%5B%5B%22say+h*%22%2C%2225_10358_21%22%2C%2224_2656_21%22%2C%2225_1746_21%22%2C%2220_15037_21%22%2C%2223_8519_21%22%5D%2C%5B%22exclamations%22%2C%2223_9011_21%22%2C%2219_11298_21%22%2C%2218_15126_21%22%2C%2220_15899_21%22%5D%2C%5B%22boats+%2F+ships%22%2C%2221_136_21%22%2C%2222_9019_21%22%2C%2219_5584_21%22%2C%2219_1504_21%22%5D%2C%5B%22nautical+things%22%2C%224_12527_12%22%2C%226_3848_12%22%2C%2214_10408_12%22%2C%2215_9318_12%22%5D%2C%5B%22theft%2C+adventure%22%2C%2214_10179_12%22%2C%2211_4036_12%22%2C%226_1601_12%22%2C%224_8481_12%22%5D%2C%5B%22pirates%22%2C%2221_10451_21%22%2C%2217_13969_21%22%5D%5D&clickedId=21_10451_21)\n",
    "- [`User:\\nGive me a word that rhymes with rabbit.\\nModel\\nA word that rhymes with rabbit is **` → `habit`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-it-rabbit-habit&clerps=%5B%5B%221701364%22%2C%22word+lists+%28start+with+same+letter%2C+rhyming%2C+anagrams%29%22%5D%2C%5B%222415029%22%2C%22predict+token+ending+with+b%22%5D%2C%5B%222306272%22%2C%22predict+iva+%2F+*ev%22%5D%2C%5B%222107273%22%2C%22predict+%28acronym%29+ending+with+B%22%5D%2C%5B%221609921%22%2C%22token+ending+with+b%22%5D%2C%5B%221511048%22%2C%22token+ending+with+b%22%5D%2C%5B%222201250%22%2C%22predict+token+starting+with+ab%22%5D%2C%5B%222401512%22%2C%22predict+token+ending+with+t%22%5D%2C%5B%222316058%22%2C%22predict+token+ending+with+T%22%5D%2C%5B%221706889%22%2C%22token+ending+with+t%22%5D%2C%5B%221603089%22%2C%22token+ending+with+*ab+%28or+containing+it%29%22%5D%2C%5B%221404646%22%2C%22animals%22%5D%2C%5B%221605010%22%2C%22words+%2F+pronunciations%22%5D%2C%5B%221414813%22%2C%22grammar%22%5D%2C%5B%221304461%22%2C%22spelling+%2F+pronunciation%22%5D%2C%5B%22601975%22%2C%22start+%2F+begin+with%22%5D%2C%5B%22309585%22%2C%22%28mis%29spell%22%5D%2C%5B%22600578%22%2C%22poet%28ry%29%22%5D%2C%5B%22411389%22%2C%22the+arts%22%5D%2C%5B%22909694%22%2C%22anagrams+%2F+wordplay%22%5D%2C%5B%221610279%22%2C%22%28rhyming%29+word+lists%22%5D%2C%5B%221402820%22%2C%22word+lists+%28start+with+same+letter%2C+rhyming%2C+anagrams%29%22%5D%5D&pinnedIds=27_16831_25%2C24_15029_25%2C24_1512_25%2C23_6272_25%2C21_7273_25%2C23_6272_14%2C16_9921_11%2C22_1250_25%2C21_7273_14%2C17_1364_25%2C14_2820_25%2C16_10279_25%2C16_5010_25%2C13_4461_25%2C14_14813_25%2C16_9921_23%2C15_11048_11%2C23_16058_25%2C23_16058_14%2C17_6889_11%2C16_3089_11%2C14_4646_11%2CE_30358_11%2C6_1975_9%2C9_9694_10%2C6_1975_10%2C6_578_9%2CE_105878_9%2CE_675_10%2C3_9585_10%2C4_11389_9%2C13_4461_14&supernodes=%5B%5B%22token+ending+with+b%22%2C%2216_9921_23%22%2C%2216_9921_11%22%2C%2215_11048_11%22%5D%2C%5B%22predict+token+ending+with+t%22%2C%2224_1512_25%22%2C%2217_6889_11%22%2C%2223_16058_14%22%2C%2223_16058_25%22%5D%2C%5B%22start+%2F+begin+with%22%2C%226_1975_10%22%2C%226_1975_9%22%5D%2C%5B%22word+lists+%28start+with+same+letter%2C+rhyming%2C+anagrams%29%22%2C%2214_2820_25%22%2C%2217_1364_25%22%2C%2216_10279_25%22%5D%2C%5B%22poet%28ry%29%22%2C%226_578_9%22%2C%224_11389_9%22%5D%2C%5B%22spelling+%2F+pronunciation%22%2C%2213_4461_25%22%2C%2214_14813_25%22%2C%2216_5010_25%22%5D%2C%5B%22predict+%28acronym%29+ending+with+B%22%2C%2221_7273_25%22%2C%2224_15029_25%22%5D%5D&clickedId=6_1975_9)\n",
    "\n",
    "For brevity, we've omitted the extra formatting tokens used with instruction-tuned models in both examples; the latter, for example, is really formatted as `<start_of_turn>user\\nGive me a word that rhymes with rabbit.<end_of_turn>\\n<start_of_turn>model\\nA word that rhymes with rabbit is **` → `habit`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "#@title Colab Setup Environment\n",
    "\n",
    "try:\n",
    "    import google.colab\n",
    "    !mkdir -p repository && cd repository && \\\n",
    "     git clone https://github.com/safety-research/circuit-tracer && \\\n",
    "     curl -LsSf https://astral.sh/uv/install.sh | sh && \\\n",
    "     uv pip install -e circuit-tracer/\n",
    "\n",
    "    import sys\n",
    "    from huggingface_hub import notebook_login\n",
    "    sys.path.append('repository/circuit-tracer')\n",
    "    sys.path.append('repository/circuit-tracer/demos')\n",
    "    notebook_login(new_session=False)\n",
    "    IN_COLAB = True\n",
    "except ImportError:\n",
    "    IN_COLAB = False"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Union, List\n",
    "from collections import namedtuple\n",
    "\n",
    "import torch\n",
    "\n",
    "from circuit_tracer.replacement_model import ReplacementModel\n",
    "from utils import display_generations_comparison\n",
    "from circuit_tracer.utils.decode_url_features import decode_url_features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "a7f72aa2eb4a4c58b7c2cdf128150607",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Fetching 26 files:   0%|          | 0/26 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "0b0e9a5ff0ef43a0bb50e11722752b45",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loaded pretrained model google/gemma-2-2b-it into HookedTransformer\n"
     ]
    }
   ],
   "source": [
    "model = ReplacementModel.from_pretrained(\"google/gemma-2-2b-it\", \"gemma\", dtype=torch.bfloat16)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll create a convenience function for turning our inputs into a chat-based format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "Feature = namedtuple('Feature', ['layer', 'pos', 'feature_idx'])\n",
    "\n",
    "def chattify(inputs:Union[str, List[str]], generate:bool=False):\n",
    "    if isinstance(inputs, str):\n",
    "        inputs = [inputs]\n",
    "    input_list = []\n",
    "    for i, s in enumerate(inputs):\n",
    "        role = 'user' if i % 2 == 0 else 'assistant'\n",
    "        input_list.append({'role': role, 'content':s})\n",
    "    chattified = model.tokenizer.apply_chat_template(input_list, tokenize=False, add_generation_prompt=role!='assistant')\n",
    "    if role == 'assistant':\n",
    "        chattified = chattified[:-14]\n",
    "    if not generate:\n",
    "        # remove bos\n",
    "        chattified =  chattified[5:]\n",
    "    return chattified"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we'll look into an example where we ask Gemma 2 (2B, Instruction-Tuned) to talk like a pirate! Here's the circuit: [`User:\\nWho are you? Please answer in pirate-speak.\\nModel\\nA` → `hoy`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-it-pirate&clerps=%5B%5D&pinnedIds=27_65226_21%2C22_9019_21%2C25_10358_21%2C25_1746_21%2C24_2656_21%2C18_15126_21%2C23_8519_21%2C21_136_21%2C23_9011_21%2C20_15037_21%2C19_11298_21%2C17_3874_21%2C20_15899_21%2C15_9318_12%2C14_10408_12%2C19_1504_21%2C17_13969_21%2C19_5584_21%2C14_10179_12%2C4_12527_12%2C6_3848_12%2C11_4036_12%2C6_1601_12%2C4_8481_12%2CE_3_12%2C21_10451_21&supernodes=%5B%5B%22say+h*%22%2C%2225_10358_21%22%2C%2224_2656_21%22%2C%2225_1746_21%22%2C%2220_15037_21%22%2C%2223_8519_21%22%5D%2C%5B%22exclamations%22%2C%2223_9011_21%22%2C%2219_11298_21%22%2C%2218_15126_21%22%2C%2220_15899_21%22%5D%2C%5B%22boats+%2F+ships%22%2C%2221_136_21%22%2C%2222_9019_21%22%2C%2219_5584_21%22%2C%2219_1504_21%22%5D%2C%5B%22nautical+things%22%2C%224_12527_12%22%2C%226_3848_12%22%2C%2214_10408_12%22%2C%2215_9318_12%22%5D%2C%5B%22theft%2C+adventure%22%2C%2214_10179_12%22%2C%2211_4036_12%22%2C%226_1601_12%22%2C%224_8481_12%22%5D%2C%5B%22pirates%22%2C%2221_10451_21%22%2C%2217_13969_21%22%5D%5D&clickedId=21_10451_21)\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/gemma-it/pirate.png\" width=\"400\" />\n",
    "\n",
    "We can see that there are a number of features relating to greetings, the sea, and crime, as well as two pirate features. Can we use these pirate features to steer Gemma to talk like a pirate more generally? First, we'll extract them from the URL."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "pirate_url = 'https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-it-pirate&clerps=%5B%5D&pinnedIds=27_65226_21%2C22_9019_21%2C25_10358_21%2C25_1746_21%2C24_2656_21%2C18_15126_21%2C23_8519_21%2C21_136_21%2C23_9011_21%2C20_15037_21%2C19_11298_21%2C17_3874_21%2C20_15899_21%2C15_9318_12%2C14_10408_12%2C19_1504_21%2C17_13969_21%2C19_5584_21%2C14_10179_12%2C4_12527_12%2C6_3848_12%2C11_4036_12%2C6_1601_12%2C4_8481_12%2CE_3_12%2C21_10451_21&supernodes=%5B%5B%22say+h*%22%2C%2225_10358_21%22%2C%2224_2656_21%22%2C%2225_1746_21%22%2C%2220_15037_21%22%2C%2223_8519_21%22%5D%2C%5B%22exclamations%22%2C%2223_9011_21%22%2C%2219_11298_21%22%2C%2218_15126_21%22%2C%2220_15899_21%22%5D%2C%5B%22boats+%2F+ships%22%2C%2221_136_21%22%2C%2222_9019_21%22%2C%2219_5584_21%22%2C%2219_1504_21%22%5D%2C%5B%22nautical+things%22%2C%224_12527_12%22%2C%226_3848_12%22%2C%2214_10408_12%22%2C%2215_9318_12%22%5D%2C%5B%22theft%2C+adventure%22%2C%2214_10179_12%22%2C%2211_4036_12%22%2C%226_1601_12%22%2C%224_8481_12%22%5D%2C%5B%22pirates%22%2C%2221_10451_21%22%2C%2217_13969_21%22%5D%5D&clickedId=21_10451_21'\n",
    "supernodes, _= decode_url_features(pirate_url)\n",
    "pirate_features = supernodes['pirates']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, we'll construct the relevant interventions, setting each pirate feature on a normal prompt to 6 times its value on the pirate prompt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_pirate = [\"Who are you? Please, answer in pirate-speak.\", \"A\"]\n",
    "_, pirate_activations = model.get_activations(chattify(s_pirate), sparse=False)\n",
    "# open-ended generation intervention\n",
    "interventions = [(pirate_feature.layer, slice(1, None), pirate_feature.feature_idx, 6 * pirate_activations[pirate_feature]) \n",
    "                 for pirate_feature in pirate_features]\n",
    "\n",
    "s_normal = ['Tell me a story.']\n",
    "hooks, _, _ = model._get_feature_intervention_hooks(chattify(s_normal), interventions)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we'll sample 5 times from the model, with and without interventions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .generations-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 12px;\n",
       "        font-size: 13px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .generations-viz .section-header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin: 10px 0 5px 0;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: block;\n",
       "    }\n",
       "    .generations-viz .pre-intervention-header {\n",
       "        background-color: #2471A3;\n",
       "    }\n",
       "    .generations-viz .post-intervention-header {\n",
       "        background-color: #27AE60;\n",
       "    }\n",
       "    .generations-viz .generation-container {\n",
       "        margin-bottom: 8px;\n",
       "        padding: 3px;\n",
       "        border-left: 3px solid rgba(100, 100, 100, 0.5);\n",
       "    }\n",
       "    .generations-viz .generation-text {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 6px 8px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        font-weight: 500;\n",
       "        white-space: pre-wrap;\n",
       "        line-height: 1.2;\n",
       "        font-size: 13px;\n",
       "        overflow-x: auto;\n",
       "    }\n",
       "    .generations-viz .base-text {\n",
       "        color: rgba(100, 100, 100, 0.9);\n",
       "    }\n",
       "    .generations-viz .new-text {\n",
       "        background-color: rgba(255, 255, 0, 0.25);\n",
       "        font-weight: bold;\n",
       "        padding: 1px 0;\n",
       "        border-radius: 2px;\n",
       "    }\n",
       "    .generations-viz .pre-intervention-item {\n",
       "        border-left-color: #2471A3;\n",
       "    }\n",
       "    .generations-viz .post-intervention-item {\n",
       "        border-left-color: #27AE60;\n",
       "    }\n",
       "    .generations-viz .generation-number {\n",
       "        font-weight: bold;\n",
       "        margin-bottom: 3px;\n",
       "        color: rgba(70, 70, 70, 0.9);\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"generations-viz\">\n",
       "    \n",
       "    <div class=\"section-header pre-intervention-header\">Pre-intervention generations:</div>\n",
       "    \n",
       "        <div class=\"generation-container pre-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 1</div>\n",
       "            <div class=\"generation-text\">user\n",
       "Tell me a story.\n",
       "model\n",
       "The air shimmered with heat and the scent of salt hung thick over the abandoned pier.  El</div>\n",
       "        </div>\n",
       "        \n",
       "        <div class=\"generation-container pre-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 2</div>\n",
       "            <div class=\"generation-text\">user\n",
       "Tell me a story.\n",
       "model\n",
       "The old lighthouse keeper, Elias, stood on the rocky cliff, his weathered face crevice holding a</div>\n",
       "        </div>\n",
       "        \n",
       "        <div class=\"generation-container pre-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 3</div>\n",
       "            <div class=\"generation-text\">user\n",
       "Tell me a story.\n",
       "model\n",
       "The salt spray stung Elara&#x27;s eyes as she hauled the fishing net, heavy with ethereal glow</div>\n",
       "        </div>\n",
       "        \n",
       "        <div class=\"generation-container pre-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 4</div>\n",
       "            <div class=\"generation-text\">user\n",
       "Tell me a story.\n",
       "model\n",
       "The old lighthouse keeper&#x27;s beard was white as snowdrift, and his eyes watched the churning mist</div>\n",
       "        </div>\n",
       "        \n",
       "        <div class=\"generation-container pre-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 5</div>\n",
       "            <div class=\"generation-text\">user\n",
       "Tell me a story.\n",
       "model\n",
       "The old bookstore smelled of woodsmoke and forgotten dreams. Lila, book cashier and chronicler of impossible</div>\n",
       "        </div>\n",
       "        \n",
       "    <div class=\"section-header post-intervention-header\">Post-intervention generations:</div>\n",
       "    \n",
       "        <div class=\"generation-container post-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 1</div>\n",
       "            <div class=\"generation-text\">user\n",
       "Tell me a story.\n",
       "model\n",
       "The wind whipped through Maisyl&#x27;s bristling red hair, carrying a &quot;swash a-</div>\n",
       "        </div>\n",
       "        \n",
       "        <div class=\"generation-container post-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 2</div>\n",
       "            <div class=\"generation-text\">user\n",
       "Tell me a story.\n",
       "model\n",
       "The air was thick with the smell of salt spray and the cry of seagulls.  Elara</div>\n",
       "        </div>\n",
       "        \n",
       "        <div class=\"generation-container post-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 3</div>\n",
       "            <div class=\"generation-text\">user\n",
       "Tell me a story.\n",
       "model\n",
       "The wind howled a mournful song through the skeletal branches, whipping scarlet leaves across the cobblestones</div>\n",
       "        </div>\n",
       "        \n",
       "        <div class=\"generation-container post-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 4</div>\n",
       "            <div class=\"generation-text\">user\n",
       "Tell me a story.\n",
       "model\n",
       "The old crataceous map clutched in Millie&#x27;s calloused hand felt hot and heavy. It</div>\n",
       "        </div>\n",
       "        \n",
       "        <div class=\"generation-container post-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 5</div>\n",
       "            <div class=\"generation-text\">user\n",
       "Tell me a story.\n",
       "model\n",
       "The salty air whipped Maera&#x27;s woolen scarf across her weathered cheeks.  The wind held the</div>\n",
       "        </div>\n",
       "        \n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "num_samples = 5\n",
    "pre_intervention_generation = [model.generate(chattify(s_normal, generate=True), do_sample=True, use_past_kv_cache=False, verbose=False, max_new_tokens=20) for _ in range(num_samples)]\n",
    "post_intervention_generation = [model.feature_intervention_generate(chattify(s_normal, generate=True), interventions, do_sample=True, verbose=False, max_new_tokens=20)[0] for _ in range(num_samples)]\n",
    "display_generations_comparison(chattify(s_normal, generate=True), pre_intervention_generation, post_intervention_generation)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This intervention elicits pirate-related stories from the model!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We've also created a circuit for the prompt [`User:\\nGive me a word that rhymes with rabbit.\\nModel\\nA word that rhymes with rabbit is **` → `habit`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-it-rabbit-habit&clerps=%5B%5B%221701364%22%2C%22word+lists+%28start+with+same+letter%2C+rhyming%2C+anagrams%29%22%5D%2C%5B%222415029%22%2C%22predict+token+ending+with+b%22%5D%2C%5B%222306272%22%2C%22predict+iva+%2F+*ev%22%5D%2C%5B%222107273%22%2C%22predict+%28acronym%29+ending+with+B%22%5D%2C%5B%221609921%22%2C%22token+ending+with+b%22%5D%2C%5B%221511048%22%2C%22token+ending+with+b%22%5D%2C%5B%222201250%22%2C%22predict+token+starting+with+ab%22%5D%2C%5B%222401512%22%2C%22predict+token+ending+with+t%22%5D%2C%5B%222316058%22%2C%22predict+token+ending+with+T%22%5D%2C%5B%221706889%22%2C%22token+ending+with+t%22%5D%2C%5B%221603089%22%2C%22token+ending+with+*ab+%28or+containing+it%29%22%5D%2C%5B%221404646%22%2C%22animals%22%5D%2C%5B%221605010%22%2C%22words+%2F+pronunciations%22%5D%2C%5B%221414813%22%2C%22grammar%22%5D%2C%5B%221304461%22%2C%22spelling+%2F+pronunciation%22%5D%2C%5B%22601975%22%2C%22start+%2F+begin+with%22%5D%2C%5B%22309585%22%2C%22%28mis%29spell%22%5D%2C%5B%22600578%22%2C%22poet%28ry%29%22%5D%2C%5B%22411389%22%2C%22the+arts%22%5D%2C%5B%22909694%22%2C%22anagrams+%2F+wordplay%22%5D%2C%5B%221610279%22%2C%22%28rhyming%29+word+lists%22%5D%2C%5B%221402820%22%2C%22word+lists+%28start+with+same+letter%2C+rhyming%2C+anagrams%29%22%5D%5D&pinnedIds=27_16831_25%2C24_15029_25%2C24_1512_25%2C23_6272_25%2C21_7273_25%2C23_6272_14%2C16_9921_11%2C22_1250_25%2C21_7273_14%2C17_1364_25%2C14_2820_25%2C16_10279_25%2C16_5010_25%2C13_4461_25%2C14_14813_25%2C16_9921_23%2C15_11048_11%2C23_16058_25%2C23_16058_14%2C17_6889_11%2C16_3089_11%2C14_4646_11%2CE_30358_11%2C6_1975_9%2C9_9694_10%2C6_1975_10%2C6_578_9%2CE_105878_9%2CE_675_10%2C3_9585_10%2C4_11389_9%2C13_4461_14&supernodes=%5B%5B%22token+ending+with+b%22%2C%2216_9921_23%22%2C%2216_9921_11%22%2C%2215_11048_11%22%5D%2C%5B%22predict+token+ending+with+t%22%2C%2224_1512_25%22%2C%2217_6889_11%22%2C%2223_16058_14%22%2C%2223_16058_25%22%5D%2C%5B%22start+%2F+begin+with%22%2C%226_1975_10%22%2C%226_1975_9%22%5D%2C%5B%22word+lists+%28start+with+same+letter%2C+rhyming%2C+anagrams%29%22%2C%2214_2820_25%22%2C%2217_1364_25%22%2C%2216_10279_25%22%5D%2C%5B%22poet%28ry%29%22%2C%226_578_9%22%2C%224_11389_9%22%5D%2C%5B%22spelling+%2F+pronunciation%22%2C%2213_4461_25%22%2C%2214_14813_25%22%2C%2216_5010_25%22%5D%2C%5B%22predict+%28acronym%29+ending+with+B%22%2C%2221_7273_25%22%2C%2224_15029_25%22%5D%5D&clickedId=6_1975_9), but we haven't performed any interventions yet. Can you perform fine-grained interventions to steer the model's output word?\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/gemma-it/rabbit-habit.png\" width=\"400\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
